Exploring EVENTS


Experiments

    1. Visualising Events Dataframe
    2. Exploring Event Tags
    3. Calculating Event Description Similarity
    4. Topic Modelling of Event Descriptions
    5. Exploring the Schedules of Events
      • 5.1 Getting the Frequency of Starting Dates of Events Schedules
      • 5.2 Getting the Frequency of End Dates of Events Schedules
    6. Exploring the Performance Tickets of Event Schedules
      • 6.1 Getting the Frequency of Price Tickets
      • 6.2 Getting the frequency of type (Standard, Children) tickets
      • 6.3 Exploring Performances Places - ATTENTION: Merging information with "places" dataframe!
        • 6.3.1 Frequency of Performances per town
        • 6.3.2 Frequency of Type tickets per town
        • 6.3.3 Frequency of Price tickets type per town
        • 6.3.4 Frequency of Max_Price tickets per town
          • 6.3.4.1 Frequency of Free tickets per town
          • 6.3.4.2 Frequency of Non-Free tickets per town
      • 6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews
        • 6.4.1 Frequency of Price Tickets per Scottish City
        • 6.4.2 Frequency of Type Tickets per Scottish City
        • 6.4.3 Frequency of Schedules Dates per Event and per Scottish City
        • 6.4.4 Grouping Schedules per Event and Scottish City
        • 6.4.5 Exploring Tags per Schedule and Scottish Cities
          • 6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh
          • 6.4.5.2 Exploring the Frequency of schedules tags for Glasgow
        • 6.4.6 Histograms of starting/end schedules dates for Edinburgh
        • 6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time
          • 6.4.7.1 Frequency of schedules Starting Date in Scottish City
          • 6.4.7.2 Frequency of schedules Ending Date in Scottish City
          • 6.4.7.3 Scheduled tags and Starting Dates in Scottish City
          • 6.4.7.4 Scheduled tags and Starting Dates in Scottish City

0. Importing libraries and loading the JSON file of events into a dataframe (the sample file holds 2282 events, as printed below)

In [129]:
import json
import pandas as pd
import plotly.express as px
import os
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
import plotly.graph_objects as go
import numpy as np
from gensim.parsing.preprocessing import remove_stopwords
import re
In [2]:
with open('dataset/sample_20200501.json', 'r') as f:
    data = json.load(f)
    print(len(data["events"]))
    events=data["events"]
df = pd.DataFrame(events)
2282

1. Visualizing the events dataframe

In [3]:
df
Out[3]:
event_id modified_ts created_ts name sort_name status id schedules descriptions tags category properties ranking_level ranking_in_level website phone_numbers alternative_names
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Days out, Glasgow City of Science, Sc... Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
1 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'start_ts': '2020-05-02T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, The Saturday Show] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
2 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'start_ts': '2020-05-10T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, Sunday Night Laugh-In] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
3 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'start_ts': '2020-05-08T19:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 1 http://thepublandlord.com/ NaN NaN
4 693673 2021-12-20T05:46:45Z 2021-12-20T05:46:45Z Matt Forde Matt Forde live 693673 [{'start_ts': '2020-05-30T18:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Comedy] Comedy {'dropin_event': False, 'booking_essential': F... 2 2 http://www.shermantheatre.co.uk/performance/co... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2277 1586593 2021-05-06T07:20:00Z 2020-08-03T14:08:11Z Loch Ness and the Highlands of Scotland Tour Loch Ness and the Highlands of Scotland Tour live 1586593 [{'start_ts': '2020-08-12T08:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Days out, History, Nature, Storytelling, Walk... Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN {'info': '0131 555 5558'} NaN
2278 1595055 2020-09-08T10:53:48Z 2020-09-06T21:24:58Z Black History Walking Tour of Edinburgh Black History Walking Tour of Edinburgh live 1595055 [{'start_ts': '2020-09-12T10:30:00+01:00', 'en... [{'type': 'description.official', 'description... [Days out, History, Tours, Walking tour, Walks] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN NaN
2279 1599103 2020-09-24T14:24:20Z 2020-09-22T12:14:45Z Pumpkin Picking Pumpkin Picking live 1599103 [{'start_ts': '2020-10-16T09:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Activities, Days out, Food & Drink] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN {'info': '07793 600 289'} NaN
2280 1603922 2020-10-14T15:45:11Z 2020-10-12T11:27:13Z Mad Hatters Afternoon Tea Mad Hatters Afternoon Tea live 1603922 [{'start_ts': '2020-10-25T14:00:00+00:00', 'en... [{'type': 'description.list.default', 'descrip... [Days out, Food & Drink] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN {'info': '0131 333 0131'} NaN
2281 1606877 2020-10-29T10:45:10Z 2020-10-23T15:37:48Z Mana Poké: Leith Pop-Up Mana Poké: Leith Pop-Up live 1606877 [{'start_ts': '2020-10-16T11:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... [Days out, Food & Drink] Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN

2282 rows × 17 columns


Experiment 2: Exploring Event Tags

We are going to split the elements stored in each tag list into separate rows.

In [5]:
df["tags"][0:5]
Out[5]:
0    [Comedy, Days out, Glasgow City of Science, Sc...
1                [Comedy, Stand-up, The Saturday Show]
2            [Comedy, Stand-up, Sunday Night Laugh-In]
3                                   [Comedy, Stand-up]
4                                             [Comedy]
Name: tags, dtype: object
In [6]:
df_tags=df.explode('tags')
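To make the behaviour of `DataFrame.explode` concrete, here is a minimal sketch on a two-row toy frame (the event ids and tags are invented): each list element becomes its own row, and the other columns and the original index label are repeated.

```python
import pandas as pd

# Toy frame with a list-valued column, shaped like df["tags"] (values invented)
toy = pd.DataFrame({
    "event_id": [1, 2],
    "tags": [["Comedy", "Stand-up"], ["Music"]],
})

# explode() emits one row per list element; the other columns and the
# original index label are repeated for every element
toy_tags = toy.explode("tags")
print(toy_tags)
```

This is why `df_tags` keeps repeated index labels (0, 0, 0, ...) and grows from 2282 to 6177 rows. An empty list is turned into a single row holding NaN.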
In [7]:
df_tags
Out[7]:
event_id modified_ts created_ts name sort_name status id schedules descriptions tags category properties ranking_level ranking_in_level website phone_numbers alternative_names
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Comedy Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Days out Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Glasgow City of Science Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Science Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'start_ts': '2020-06-23T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Stand-up Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2279 1599103 2020-09-24T14:24:20Z 2020-09-22T12:14:45Z Pumpkin Picking Pumpkin Picking live 1599103 [{'start_ts': '2020-10-16T09:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Food & Drink Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN {'info': '07793 600 289'} NaN
2280 1603922 2020-10-14T15:45:11Z 2020-10-12T11:27:13Z Mad Hatters Afternoon Tea Mad Hatters Afternoon Tea live 1603922 [{'start_ts': '2020-10-25T14:00:00+00:00', 'en... [{'type': 'description.list.default', 'descrip... Days out Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN {'info': '0131 333 0131'} NaN
2280 1603922 2020-10-14T15:45:11Z 2020-10-12T11:27:13Z Mad Hatters Afternoon Tea Mad Hatters Afternoon Tea live 1603922 [{'start_ts': '2020-10-25T14:00:00+00:00', 'en... [{'type': 'description.list.default', 'descrip... Food & Drink Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN {'info': '0131 333 0131'} NaN
2281 1606877 2020-10-29T10:45:10Z 2020-10-23T15:37:48Z Mana Poké: Leith Pop-Up Mana Poké: Leith Pop-Up live 1606877 [{'start_ts': '2020-10-16T11:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Days out Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN
2281 1606877 2020-10-29T10:45:10Z 2020-10-23T15:37:48Z Mana Poké: Leith Pop-Up Mana Poké: Leith Pop-Up live 1606877 [{'start_ts': '2020-10-16T11:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... Food & Drink Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN NaN

6177 rows × 17 columns

In [8]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags
Out[8]:
tags number_of_times
285 Music 643
166 Film 363
88 Comedy 356
115 Days out 347
441 Theatre 321
... ... ...
235 Interactive 1
233 Innerleithen Music Festival 1
225 Hip Hop/Rap 1
222 Highland Games 1
498 wilderness skills 1

499 rows × 2 columns
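The groupby/size/rename/sort chain above can also be written with `Series.value_counts`, which counts each distinct value and sorts by count in descending order in one step. A sketch on an invented tag column standing in for `df_tags["tags"]`:

```python
import pandas as pd

# Invented exploded tag column, standing in for df_tags["tags"]
tags = pd.Series(["Music", "Film", "Music", "Comedy", "Music", "Film"], name="tags")

# value_counts() counts each distinct value and sorts by count, descending;
# rename_axis/reset_index give the same two columns as g_tags
g = (
    tags.value_counts()
    .rename_axis("tags")
    .reset_index(name="number_of_times")
)
print(g)
```

Both approaches skip missing tags by default (`dropna=True`).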

In [9]:
fig = px.line(g_tags, x="tags", y="number_of_times", title='Number of times that each tag appears')
fig.show()

Experiment 3: Description Similarity

Exploding the descriptions column

Each descriptions cell holds a list of description dicts; we create a new row for every element of that list.

In [10]:
df["descriptions"][0:5]
Out[10]:
0    [{'type': 'description.list.default', 'descrip...
1    [{'type': 'description.list.default', 'descrip...
2    [{'type': 'description.list.default', 'descrip...
3    [{'type': 'description.list.default', 'descrip...
4    [{'type': 'description.list.default', 'descrip...
Name: descriptions, dtype: object
In [11]:
df_descriptions=df.explode('descriptions')
In [12]:
df_d=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
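The `apply(pd.Series)` step turns each description dict into a row of columns. A sketch on invented records (only the `type`/`description` keys follow the dataset) shows the effect, together with `pd.json_normalize` as an alternative that builds the same columns from the list of dicts.

```python
import pandas as pd

# Invented exploded frame: one description dict per row, like df_descriptions
exploded = pd.DataFrame({
    "event_id": [232545, 232545, 347164],
    "descriptions": [
        {"type": "description.list.default", "description": "First text"},
        {"type": "description.official", "description": "Second text"},
        {"type": "description.list.default", "description": "Third text"},
    ],
}, index=[0, 0, 1])

# Approach used in the notebook: each dict becomes a Series, i.e. a row of columns
expanded = pd.concat(
    [exploded.drop(columns=["descriptions"]), exploded["descriptions"].apply(pd.Series)],
    axis=1,
)

# Alternative: json_normalize on the list of dicts, re-attaching the original index
normalized = pd.json_normalize(exploded["descriptions"].tolist())
normalized.index = exploded.index

print(expanded[["event_id", "description"]])
```

`json_normalize` expects every element to be a dict, so rows holding NaN (which `explode` produces for empty lists) would need dropping first; `apply(pd.Series)` tolerates them.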
In [13]:
df_desc=df_d[["event_id", "description"]]
In [14]:
df_desc
Out[14]:
event_id description
0 232545 Hardworking staff of the city's universities a...
0 232545 Bright Club's unique blend of comedy and acade...
1 347164 Saturday nights à la Stand are normally a sold...
1 347164 Our fabulous flagship Saturday show has been s...
2 347313 End the week with the generally very chilled S...
... ... ...
2278 1595055 The tour will be led by Lisa Williams, directo...
2279 1599103 Pull on your wellies, wrap up warm and come pi...
2280 1603922 An Alice in Wonderland-themed afternoon tea, c...
2281 1606877 Try the traditional Hawaiian poké dish at Mana...
2281 1606877 Tired of your hoose?\n\nYou’re welcome at ours...

3770 rows × 2 columns

Finding events with similar descriptions - Deep Learning - Transformers

In [15]:
# removing the rows whose description is empty
df_desc1=df_desc.dropna(subset=['description']).reset_index()
In [16]:
df_desc1[0:5]
Out[16]:
index event_id description
0 0 232545 Hardworking staff of the city's universities a...
1 0 232545 Bright Club's unique blend of comedy and acade...
2 1 347164 Saturday nights à la Stand are normally a sold...
3 1 347164 Our fabulous flagship Saturday show has been s...
4 2 347313 End the week with the generally very chilled S...
In [17]:
# total number of rows with descriptions
df_desc1.shape[0]
Out[17]:
3758
In [18]:
# selecting the description column
documents=df_desc1["description"].values
In [20]:
#d=documents[0:100]
In [132]:
def clean_documents(text):
    text = re.sub(r'\S*@\S*\s?', '', text, flags=re.MULTILINE) # remove email
    text = re.sub(r'http\S+', '', text, flags=re.MULTILINE) # remove web addresses
    text = re.sub("\'", "", text) # remove single quotes
    text = remove_stopwords(text)
    return text
We store the cleaned documents in d.
In [133]:
d = [clean_documents(text) for text in documents]
In [134]:
# Using all-MiniLM-L6-v2 Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')
In [135]:
# Encoding the cleaned descriptions with the pre-trained all-MiniLM-L6-v2 model (no training takes place)
text_embeddings = model.encode(d, batch_size = 8, show_progress_bar = True)

In [136]:
np.shape(text_embeddings)
Out[136]:
(3758, 384)
In [137]:
### A small example of how to get an embedding vector from a single description
In [138]:
first_description=df_desc1["description"].iloc[0]
# clean it the same way as the corpus, so its embedding is comparable with text_embeddings
first_description_embedding= model.encode(clean_documents(first_description))

Finding the similarity between documents

In [139]:
similarity_def=cosine_similarity(
    [first_description_embedding],
    text_embeddings)
In [140]:
similarities = cosine_similarity(text_embeddings)
print('pairwise dense output:\n {}\n'.format(similarities))
pairwise dense output:
 [[0.99999994 0.51764107 0.17253453 ... 0.14323813 0.10087886 0.0988165 ]
 [0.51764107 1.0000002  0.1672174  ... 0.23244801 0.02874976 0.04988466]
 [0.17253453 0.1672174  1.         ... 0.25728577 0.40647948 0.3765323 ]
 ...
 [0.14323813 0.23244801 0.25728577 ... 1.0000002  0.22144413 0.21084985]
 [0.10087886 0.02874976 0.40647948 ... 0.22144413 1.         0.6258373 ]
 [0.0988165  0.04988466 0.3765323  ... 0.21084985 0.6258373  1.        ]]
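Cosine similarity is the dot product of two vectors divided by the product of their norms, cos(a, b) = a·b / (‖a‖‖b‖). The NumPy sketch below, using random stand-in vectors rather than real embeddings, reproduces what `cosine_similarity` computes: scale each row to unit length, then take all pairwise dot products. It also explains diagonal values such as 0.99999994 and 1.0000002 above: self-similarity is 1 up to floating-point rounding.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
# Random stand-ins for sentence embeddings: 4 "documents" x 384 dimensions
emb = rng.normal(size=(4, 384)).astype(np.float32)

# Cosine similarity by hand: scale each row to unit L2 norm, then take dot products
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
manual = unit @ unit.T

sk = cosine_similarity(emb)
print(np.allclose(manual, sk, atol=1e-5))  # the two matrices agree up to rounding
print(np.diag(sk))                         # self-similarity is ~1.0 on the diagonal
```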

In [141]:
similarities_sorted = similarities.argsort()
similarities_sorted
Out[141]:
array([[1888,  315,  314, ..., 3353,    1,    0],
       [1993, 1168, 1646, ..., 3555, 3578,    1],
       [2602, 1021, 1022, ...,   31, 3390,    2],
       ...,
       [ 914, 1027, 1028, ..., 3347, 3486, 3755],
       [1014, 1013,  726, ..., 2857, 3757, 3756],
       [ 437, 1014, 1013, ..., 2953, 3756, 3757]])
In [142]:
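id_1 = []
id_2 = []
score = []
# argsort sorts each row in ascending order, so array[-1] is normally the
# document itself (self-similarity ~1.0) and array[-2] its closest neighbour
for index, array in enumerate(similarities_sorted):
    id_1.append(index)
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])
index_df = pd.DataFrame({'id_1' : id_1,
                          'id_2' : id_2,
                          'score' : score})
# number of documents compared in each row
print(similarities_sorted.shape[1])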
3758
In [143]:
index_df
Out[143]:
id_1 id_2 score
0 0 1 0.517641
1 1 3578 0.651061
2 2 3390 0.545395
3 3 31 0.687042
4 4 3059 0.493677
... ... ... ...
3753 3753 3752 1.000000
3754 3754 2254 0.418068
3755 3755 3486 0.557672
3756 3756 3757 0.625837
3757 3757 3756 0.625837

3758 rows × 3 columns
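The loop above assumes that `array[-1]` is always the document itself. With exact duplicate descriptions (row 3753 is matched to 3752 with score 1.000000), floating-point rounding can put the duplicate in the last position instead, and `array[-2]` would then point back at the document itself. A vectorised sketch that avoids this assumption masks the diagonal before taking the arg-max; `sims` is a small invented matrix, but the same code applies to `similarities`.

```python
import numpy as np
import pandas as pd

# Small invented similarity matrix; rows 1 and 2 are exact duplicates of each other
sims = np.array([
    [1.0, 0.2, 0.2, 0.5],
    [0.2, 1.0, 1.0, 0.1],
    [0.2, 1.0, 1.0, 0.1],
    [0.5, 0.1, 0.1, 1.0],
], dtype=np.float32)

# Work on a copy and give each document -inf similarity to itself,
# so argmax can never return the document's own index
masked = sims.copy()
np.fill_diagonal(masked, -np.inf)

best = masked.argmax(axis=1)  # most similar *other* document for each row
nn_df = pd.DataFrame({
    "id_1": np.arange(len(sims)),
    "id_2": best,
    "score": sims[np.arange(len(sims)), best],
})
print(nn_df)
```

Duplicates 1 and 2 are matched to each other rather than to themselves.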

Finding the 10 most similar descriptions to document 3

In [144]:
## Let's take document 3
doc_index = 3
documents[doc_index]
Out[144]:
"Our fabulous flagship Saturday show has been sold out every week for the past five years. And no wonder- it's the best night out in Edinburgh. Five different acts on every bill including our top drawer resident comperes and the best headliners from the UK and abroad.  And we're open from 7pm so you can take advantage of our lovely home-cooked food too."
In [145]:
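results={}
# walk the sorted row backwards from its second-last position: the 10 nearest
# neighbours of doc_index, skipping the last position (normally the document itself)
for i in range(-2, -12, -1):
    similar_index=similarities_sorted[doc_index][i]
    sim_score=similarities[doc_index][similar_index]  # cosine similarity, not a rank
    results[similar_index]=[sim_score]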
In [146]:
results
Out[146]:
{31: [0.6870422],
 30: [0.6870422],
 11: [0.62863076],
 10: [0.62863076],
 3356: [0.5912417],
 3060: [0.559214],
 752: [0.54171896],
 751: [0.54171896],
 744: [0.54105425],
 743: [0.54105425]}
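The same masking idea gives the k most similar descriptions for one document without relying on where the self-match lands in the sorted row. The helper name `top_k_similar` is hypothetical, not part of any library:

```python
import numpy as np

def top_k_similar(similarities: np.ndarray, doc_index: int, k: int = 10) -> dict:
    """Hypothetical helper: the k documents most similar to doc_index, excluding itself.

    Returns {other_index: score}, ordered from most to least similar.
    """
    row = similarities[doc_index].astype(float)  # float64 copy of the row
    row[doc_index] = -np.inf                     # never return the document itself
    order = np.argsort(row)[::-1][:k]            # indices by descending score
    return {int(i): float(row[i]) for i in order}

# Invented 4 x 4 similarity matrix, for illustration only
sims = np.array([
    [1.0, 0.3, 0.8, 0.5],
    [0.3, 1.0, 0.4, 0.2],
    [0.8, 0.4, 1.0, 0.6],
    [0.5, 0.2, 0.6, 1.0],
])
result = top_k_similar(sims, doc_index=0, k=2)
print(result)
```

In the notebook this would be called as `top_k_similar(similarities, doc_index, k=10)`.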

Experiment 4: Description Topic Modelling - Deep Learning - BERTopic

Let's build a topic model of our descriptions. We reuse the text_embeddings computed in the previous experiment.

In [147]:
len(documents)
Out[147]:
3758
We use the cleaned documents stored in "d".
In [148]:
topic_model = BERTopic(min_topic_size=20).fit(d, text_embeddings)
In [149]:
topics, probs = topic_model.transform(d, text_embeddings)

Visualizing our topics

In [150]:
topic_model.visualize_topics()
In [151]:
#### Visualizing the top keywords of the most frequent topics
In [152]:
topic_model.visualize_barchart()

Visualizing the similarity between topics

In [153]:
topic_model.visualize_heatmap()

Getting the frequency of each topic.

Topic -1 is BERTopic's outlier topic (documents that were not assigned to any topic), so we ignore it.

In [154]:
# Frequency of the 10 most frequent topics (the table is sorted by Count)
topic_model.get_topic_freq()[0:10]
Out[154]:
Topic Count
0 0 932
1 1 801
2 -1 562
3 2 249
4 3 233
5 4 107
6 5 99
7 6 97
8 7 87
9 8 70
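Since topic -1 collects the outlier documents, it can be dropped from the frequency table before reading it. A sketch on a DataFrame rebuilt from the ten rows printed above; `freq` stands in for `topic_model.get_topic_freq()`, which returns the same Topic/Count layout.

```python
import pandas as pd

# The ten rows printed above, standing in for topic_model.get_topic_freq()[0:10]
freq = pd.DataFrame({
    "Topic": [0, 1, -1, 2, 3, 4, 5, 6, 7, 8],
    "Count": [932, 801, 562, 249, 233, 107, 99, 97, 87, 70],
})

# Drop the outlier topic (-1) and report how many documents it held
real_topics = freq[freq["Topic"] != -1].reset_index(drop=True)
outlier_count = int(freq.loc[freq["Topic"] == -1, "Count"].sum())
print(real_topics)
print(f"outlier documents: {outlier_count} of 3758 ({outlier_count / 3758:.1%})")
```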
In [155]:
# note: this count includes the -1 outlier topic
print("Number of topics found %s" %len(topic_model.get_topic_freq()))
Number of topics found 25

Visualizing the keywords of our topics.

In [160]:
#topic_model.get_topics()
In [161]:
document_2_topic=topics[2]
print("The topic of the document 2 is %s " %document_2_topic)
The topic of the document 2 is 7 
In [162]:
topic_model.get_topic(document_2_topic)
Out[162]:
[('food', 0.060557469275690036),
 ('wines', 0.04789689523252818),
 ('wine', 0.030598286904873512),
 ('market', 0.030016429102301713),
 ('tea', 0.029753079984174157),
 ('tasting', 0.029261193594240037),
 ('producers', 0.02754705756088448),
 ('vegan', 0.0266125208866976),
 ('beer', 0.02478806472585091),
 ('drink', 0.022996005628947342)]
In [163]:
# note: .iloc[7] shows document 7; document 2 (the one assigned to topic 7 above) is .iloc[2]
df_desc1["description"].iloc[7]
Out[163]:
'Citizens of Hope and Glory! Our new tomorrow beckons. A new tomorrow that smells reassuringly of yesterday, but with WiFi.\n\nOne last heave and we will be there. And when the going gets tough the tough get going. Although the going won’t get tough and anyone who tells you it is going to be tough is lying.\n\nCome join me to marvel at the majesty of our green and pleasant land. A land just off the coast of France, except now it feels a little bit further away.\n\nJoin me to step backwards into the future.'
In [49]:
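# document_3_topic is not defined elsewhere in this notebook, so define it before use
# (the stopword keywords below suggest this output, In [49], predates the stopword removal in clean_documents)
document_3_topic = topics[3]
topic_model.get_topic(document_3_topic)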
Out[49]:
[('and', 0.036770205053373195),
 ('of', 0.03596868601318236),
 ('to', 0.0353321012642645),
 ('the', 0.03323664144688808),
 ('in', 0.03025119623083119),
 ('with', 0.023732713304853305),
 ('is', 0.021970336866618097),
 ('for', 0.020736733092193847),
 ('on', 0.02047223163289919),
 ('from', 0.018326986408788923)]

Experiment 5: Exploring the Schedules of Events

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.
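The hierarchy above can also be flattened in a single call with `pandas.json_normalize`, giving one row per ticket with the identifiers of the enclosing performance, schedule and event attached. This is a sketch on a tiny hand-made record: the field names (`event_id`, `schedules`, `place_id`, `performances`, `ts`, `tickets`, `type`, `currency`, `min_price`, `max_price`) follow the dataset, but the values are invented.

```python
import pandas as pd

# A tiny, hand-made event mimicking the Event > Schedule > Performance > Ticket nesting
sample_events = [
    {
        "event_id": 1,
        "schedules": [
            {
                "place_id": 10,
                "performances": [
                    {"ts": "2020-05-02T20:30:00+01:00",
                     "tickets": [
                         {"type": "Standard", "currency": "GBP", "min_price": "0", "max_price": "17.5"},
                         {"type": "Children", "currency": "GBP", "min_price": "0", "max_price": "8"},
                     ]},
                    {"ts": "2020-05-09T20:30:00+01:00",
                     "tickets": [
                         {"type": "Standard", "currency": "GBP", "min_price": "0", "max_price": "17.5"},
                     ]},
                ],
            }
        ],
    }
]

# One row per ticket; each meta path pulls a field from the level it names
flat = pd.json_normalize(
    sample_events,
    record_path=["schedules", "performances", "tickets"],
    meta=["event_id", ["schedules", "place_id"], ["schedules", "performances", "ts"]],
)
print(flat)
```

Unlike the explode chain used below, `json_normalize` expects every level to be present: a performance without a `tickets` key would likely raise a `KeyError`, so such rows would need filtering first.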

Let's start by exploding the schedules column.

In [50]:
df["schedules"]
Out[50]:
0       [{'start_ts': '2020-06-23T20:30:00+01:00', 'en...
1       [{'start_ts': '2020-05-02T20:30:00+01:00', 'en...
2       [{'start_ts': '2020-05-10T20:30:00+01:00', 'en...
3       [{'start_ts': '2020-05-08T19:30:00+01:00', 'en...
4       [{'start_ts': '2020-05-30T18:00:00+01:00', 'en...
                              ...                        
2277    [{'start_ts': '2020-08-12T08:00:00+01:00', 'en...
2278    [{'start_ts': '2020-09-12T10:30:00+01:00', 'en...
2279    [{'start_ts': '2020-10-16T09:00:00+01:00', 'en...
2280    [{'start_ts': '2020-10-25T14:00:00+00:00', 'en...
2281    [{'start_ts': '2020-10-16T11:00:00+01:00', 'en...
Name: schedules, Length: 2282, dtype: object
In [51]:
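df_schedules=df  # not a copy: the in-place rename below also renames the columns of df itself
# 'links' is not an event-level column (see the column list above), so that entry has no effect
df_schedules.rename(columns={'tags': 'event_tags', 'name': 'event_name', 'links': 'event_links'}, inplace=True)
df_schedules=df.explode('schedules')
# each schedule dict becomes a row of columns; df_s ends up with two 'phone_numbers'
# columns (event level and schedule level), as the output below shows
df_s=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)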
In [52]:
df_s.iloc[0]
Out[52]:
event_id                                                        232545
modified_ts                                       2020-03-15T12:18:05Z
created_ts                                        2011-07-14T15:03:56Z
event_name                                                 Bright Club
sort_name                                                  Bright Club
status                                                            live
id                                                              232545
descriptions         [{'type': 'description.list.default', 'descrip...
event_tags           [Comedy, Days out, Glasgow City of Science, Sc...
category                                                        Comedy
properties           {'dropin_event': False, 'booking_essential': F...
ranking_level                                                        2
ranking_in_level                                                     1
website                                                            NaN
phone_numbers                                                      NaN
alternative_names                                                  NaN
start_ts                                     2020-06-23T20:30:00+01:00
end_ts                                       2020-06-23T20:30:00+01:00
place_id                                                             1
performances         [{'ts': '2020-06-23T20:30:00+01:00', 'duration...
performance_space                                                  NaN
phone_numbers                                                      NaN
Name: 0, dtype: object

Getting the Frequency of Starting Dates of Events Schedules

In [53]:
df_start=df_s.groupby([pd.to_datetime(df_s['start_ts'])]).size().reset_index()
df_start=df_start.rename(columns={0: "number_of_times"})
df_start=df_start.sort_values(by=['number_of_times'], ascending=False)
df_start.reset_index()
Out[53]:
index start_ts number_of_times
0 4 2020-05-01 10:00:00+01:00 46
1 0 2020-05-01 00:00:00+01:00 17
2 97 2020-05-08 19:30:00+01:00 11
3 1289 2020-09-19 19:00:00+01:00 9
4 38 2020-05-02 20:00:00+01:00 8
... ... ... ...
1642 733 2020-08-06 17:15:00+01:00 1
1643 732 2020-08-06 17:10:00+01:00 1
1644 731 2020-08-06 17:05:00+01:00 1
1645 730 2020-08-06 16:00:00+01:00 1
1646 823 2020-08-07 21:10:00+01:00 1

1647 rows × 3 columns

Visualizing the frequency of schedule start dates

In [55]:
# with y given, px.histogram sums number_of_times within each time bin
fig = px.histogram(df_start, x='start_ts', y="number_of_times", title="Frequency of Schedule Start Dates")
fig.show()
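Grouping by the exact timestamp, as in `df_start`, counts a 10:00 start and a 19:30 start on the same day separately. A sketch that counts schedules per calendar day instead: parsing with `utc=True` gives one consistent timezone even though the offsets in this data mix +01:00 and +00:00 (summer and winter time), and converting to `Europe/London` keeps each start on its local date. The timestamps are taken from values printed earlier; in the notebook the Series would be `df_s['start_ts']`.

```python
import pandas as pd

# Timestamps taken from values printed above; in the notebook use df_s["start_ts"]
start_ts = pd.Series([
    "2020-05-01T00:00:00+01:00",
    "2020-05-01T10:00:00+01:00",
    "2020-05-02T20:30:00+01:00",
    "2020-10-25T14:00:00+00:00",
    "2020-10-31T11:00:00+00:00",
])

# utc=True parses mixed offsets into one tz-aware dtype; converting back to
# Europe/London keeps a midnight (+01:00) start on its own local day
local = pd.to_datetime(start_ts, utc=True).dt.tz_convert("Europe/London")

# Count schedules per local calendar day, in date order
per_day = local.dt.date.value_counts().sort_index()
print(per_day)
```

Without the conversion back to local time, the 00:00+01:00 start would be counted on the previous day in UTC.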

Getting the Frequency of End Dates of Events Schedules

In [56]:
df_end=df_s.groupby([pd.to_datetime(df_s['end_ts'])]).size().reset_index()
df_end=df_end.rename(columns={0: "number_of_times"})
df_end=df_end.sort_values(by=['number_of_times'], ascending=False)
df_end.reset_index()
fig = px.histogram(df_end, x='end_ts', y="number_of_times", title="Frequency of End Dates Schedules")
fig.show()

Experiment 6: Exploring the Performances Tickets of Events Schedules

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the performances column. The performances lists live inside each schedule dict, so they only become a column of their own once the schedules column has been exploded and expanded. For that reason, we use the df_s dataframe, in which the schedules column has already been exploded.
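The dependency can be seen on a toy frame whose values are invented but whose nesting mirrors the data: `performances` only exists inside each schedule dict, so it becomes an explodable column only after the schedules list has been exploded and its dicts expanded.

```python
import pandas as pd

# Toy events frame: a list of schedule dicts, each holding a list of performances
toy_events = pd.DataFrame({
    "event_id": [1, 2],
    "schedules": [
        [{"place_id": 10, "performances": [{"ts": "a"}, {"ts": "b"}]},
         {"place_id": 11, "performances": [{"ts": "c"}]}],
        [{"place_id": 12, "performances": [{"ts": "d"}]}],
    ],
})

# Step 1: one row per schedule, then expand each schedule dict into columns
toy_s = toy_events.explode("schedules")
toy_s = pd.concat([toy_s.drop(columns=["schedules"]), toy_s["schedules"].apply(pd.Series)], axis=1)

# Step 2: "performances" is now an ordinary list column, so it can be exploded
toy_p = toy_s.explode("performances")
toy_p = pd.concat([toy_p.drop(columns=["performances"]), toy_p["performances"].apply(pd.Series)], axis=1)
print(toy_p)
```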

In [57]:
df_s
Out[57]:
event_id modified_ts created_ts event_name sort_name status id descriptions event_tags category ... ranking_in_level website phone_numbers alternative_names start_ts end_ts place_id performances performance_space phone_numbers
0 232545 2020-03-15T12:18:05Z 2011-07-14T15:03:56Z Bright Club Bright Club live 232545 [{'type': 'description.list.default', 'descrip... [Comedy, Days out, Glasgow City of Science, Sc... Comedy ... 1 NaN NaN NaN 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00 1 [{'ts': '2020-06-23T20:30:00+01:00', 'duration... NaN NaN
1 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, The Saturday Show] Comedy ... 2 NaN NaN NaN 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1 [{'ts': '2020-05-02T20:30:00+01:00', 'duration... NaN NaN
2 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up, Sunday Night Laugh-In] Comedy ... 2 NaN NaN NaN 2020-05-10T20:30:00+01:00 2020-07-26T20:30:00+01:00 1 [{'ts': '2020-06-21T20:30:00+01:00', 'duration... NaN NaN
3 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'type': 'description.list.default', 'descrip... [Comedy, Stand-up] Comedy ... 1 http://thepublandlord.com/ NaN NaN 2020-05-08T19:30:00+01:00 2020-09-18T19:30:00+01:00 22978 [{'ts': '2020-05-08T19:30:00+01:00', 'links': ... NaN NaN
4 693673 2021-12-20T05:46:45Z 2021-12-20T05:46:45Z Matt Forde Matt Forde live 693673 [{'type': 'description.list.default', 'descrip... [Comedy] Comedy ... 2 http://www.shermantheatre.co.uk/performance/co... NaN NaN 2020-05-30T18:00:00+01:00 2020-05-31T20:30:00+01:00 1 [{'ts': '2020-05-30T18:00:00+01:00', 'links': ... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2277 1586593 2021-05-06T07:20:00Z 2020-08-03T14:08:11Z Loch Ness and the Highlands of Scotland Tour Loch Ness and the Highlands of Scotland Tour live 1586593 [{'type': 'description.list.default', 'descrip... [Days out, History, Nature, Storytelling, Walk... Days out ... 1 NaN {'info': '0131 555 5558'} NaN 2020-08-12T08:00:00+01:00 2020-10-31T08:00:00+00:00 127571 [{'ts': '2020-08-12T08:00:00+01:00', 'duration... NaN NaN
2278 1595055 2020-09-08T10:53:48Z 2020-09-06T21:24:58Z Black History Walking Tour of Edinburgh Black History Walking Tour of Edinburgh live 1595055 [{'type': 'description.official', 'description... [Days out, History, Tours, Walking tour, Walks] Days out ... 2 NaN NaN NaN 2020-09-12T10:30:00+01:00 2020-10-24T10:30:00+01:00 127985 [{'ts': '2020-09-12T10:30:00+01:00', 'duration... NaN NaN
2279 1599103 2020-09-24T14:24:20Z 2020-09-22T12:14:45Z Pumpkin Picking Pumpkin Picking live 1599103 [{'type': 'description.list.default', 'descrip... [Activities, Days out, Food & Drink] Days out ... 2 NaN {'info': '07793 600 289'} NaN 2020-10-16T09:00:00+01:00 2020-10-19T09:00:00+01:00 128231 [{'ts': '2020-10-16T09:00:00+01:00', 'duration... NaN NaN
2280 1603922 2020-10-14T15:45:11Z 2020-10-12T11:27:13Z Mad Hatters Afternoon Tea Mad Hatters Afternoon Tea live 1603922 [{'type': 'description.list.default', 'descrip... [Days out, Food & Drink] Days out ... 2 NaN {'info': '0131 333 0131'} NaN 2020-10-25T14:00:00+00:00 2020-10-25T14:00:00+00:00 128392 [{'ts': '2020-10-25T14:00:00+00:00', 'duration... NaN NaN
2281 1606877 2020-10-29T10:45:10Z 2020-10-23T15:37:48Z Mana Poké: Leith Pop-Up Mana Poké: Leith Pop-Up live 1606877 [{'type': 'description.list.default', 'descrip... [Days out, Food & Drink] Days out ... 1 NaN NaN NaN 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456 [{'ts': '2020-10-16T11:00:00+01:00', 'duration... NaN NaN

2542 rows × 22 columns

In [58]:
a=df_s[["event_id", "event_name", "performances", "event_tags", "start_ts", "end_ts", "place_id"]]
df_p=a.explode("performances")
In [59]:
df_p
Out[59]:
event_id event_name performances event_tags start_ts end_ts place_id
0 232545 Bright Club {'ts': '2020-06-23T20:30:00+01:00', 'duration'... [Comedy, Days out, Glasgow City of Science, Sc... 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00 1
1 347164 The Saturday Show {'ts': '2020-05-02T20:30:00+01:00', 'duration'... [Comedy, Stand-up, The Saturday Show] 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1
1 347164 The Saturday Show {'ts': '2020-05-09T20:30:00+01:00', 'duration'... [Comedy, Stand-up, The Saturday Show] 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1
1 347164 The Saturday Show {'ts': '2020-05-16T20:30:00+01:00', 'duration'... [Comedy, Stand-up, The Saturday Show] 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1
1 347164 The Saturday Show {'ts': '2020-05-23T20:30:00+01:00', 'duration'... [Comedy, Stand-up, The Saturday Show] 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1
... ... ... ... ... ... ... ...
2281 1606877 Mana Poké: Leith Pop-Up {'ts': '2020-10-27T11:00:00+00:00', 'duration'... [Days out, Food & Drink] 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456
2281 1606877 Mana Poké: Leith Pop-Up {'ts': '2020-10-28T11:00:00+00:00', 'duration'... [Days out, Food & Drink] 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456
2281 1606877 Mana Poké: Leith Pop-Up {'ts': '2020-10-29T11:00:00+00:00', 'duration'... [Days out, Food & Drink] 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456
2281 1606877 Mana Poké: Leith Pop-Up {'ts': '2020-10-30T11:00:00+00:00', 'duration'... [Days out, Food & Drink] 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456
2281 1606877 Mana Poké: Leith Pop-Up {'ts': '2020-10-31T11:00:00+00:00', 'duration'... [Days out, Food & Drink] 2020-10-16T11:00:00+01:00 2020-10-31T11:00:00+00:00 128456

26025 rows × 7 columns

In [60]:
df_p=pd.concat([df_p.drop(['performances'], axis=1), df_p['performances'].apply(pd.Series)], axis=1)
In [61]:
df_p[0:2]
Out[61]:
event_id event_name event_tags start_ts end_ts place_id ts duration links tickets properties descriptions time_unknown
0 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00 1 2020-06-23T20:30:00+01:00 120.0 [{'type': 'booking', 'url': 'https://www.thest... [{'type': 'Standard', 'currency': 'GBP', 'min_... NaN NaN NaN
1 347164 The Saturday Show [Comedy, Stand-up, The Saturday Show] 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 1 2020-05-02T20:30:00+01:00 135.0 [{'type': 'booking', 'url': 'https://www.thest... [{'type': 'Standard', 'currency': 'GBP', 'min_... {'performance.cancelled': True} [{'type': 'list.description.default', 'descrip... NaN

Exploring tickets

Next we explode the tickets column. First, we remove the rows whose tickets information is empty.

In [62]:
df_p=df_p.dropna(subset=['tickets'])

Since we don't need all the columns, we select a few of them.

In [63]:
df_t=df_p[["event_id", "event_name", "descriptions", "event_tags", "tickets", "place_id", "start_ts", "end_ts"]]
In [64]:
df_t[0:5]
Out[64]:
event_id event_name descriptions event_tags tickets place_id start_ts end_ts
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00
In [65]:
df_t1=df_t.explode("tickets")
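To illustrate what `explode` does here, the toy frame below (hypothetical values, not taken from the dataset) has one row holding a list of two ticket dicts and one row holding an empty list. After exploding, each dict gets its own row and the original index label is repeated, which is why the same index appears several times in the outputs that follow. An empty list becomes a single NaN row.

```python
import pandas as pd

# Hypothetical toy data shaped like the "tickets" column: a list of dicts per row
df = pd.DataFrame({
    "event_id": [1, 2],
    "tickets": [
        [{"type": "Standard", "max_price": "10"}, {"type": "Members", "max_price": "8"}],
        [],
    ],
})

exploded = df.explode("tickets")
print(exploded)
# Index label 0 is repeated once per ticket; the empty list in row 1 becomes NaN
print(exploded.index.tolist())
```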

Next, we expand each ticket dictionary into separate columns and convert the minimum and maximum ticket prices to numeric values, replacing missing prices with 0.

In [66]:
# Expand each ticket dict into its own columns (type, currency, min_price, max_price, ...)
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
# Convert prices to numbers, then treat missing prices as 0
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
In [67]:
df_tickets[0:5]
Out[67]:
event_id event_name descriptions event_tags place_id start_ts end_ts 0 currency description max_price min_price type
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... 1 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00 NaN GBP NaN 0.0 5.0 Standard
0 232545 Bright Club NaN [Comedy, Days out, Glasgow City of Science, Sc... 1 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00 NaN GBP NaN 0.0 5.0 Members
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 NaN GBP NaN 0.0 17.5 Standard
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 NaN GBP NaN 0.0 17.5 Standard
1 347164 The Saturday Show [{'type': 'list.description.default', 'descrip... [Comedy, Stand-up, The Saturday Show] 1 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00 NaN GBP NaN 0.0 17.5 Standard
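Because `fillna(0)` runs after `to_numeric`, a ticket whose `max_price` is missing ends up looking exactly like a free ticket; in the output above, Bright Club has `min_price` 5.0 but `max_price` 0.0. One way to keep the distinction, sketched here on hypothetical toy values with a hypothetical `max_price_missing` column, is to record which prices were missing before filling them:

```python
import pandas as pd

# Hypothetical toy tickets: one free, one priced, one with no max_price at all
df = pd.DataFrame({"min_price": ["0", "5", "5"], "max_price": ["0", "12.5", None]})

# errors="coerce" turns unparseable values into NaN instead of raising
df["min_price"] = pd.to_numeric(df["min_price"], errors="coerce")
df["max_price"] = pd.to_numeric(df["max_price"], errors="coerce")

# Remember which max prices were missing before they are filled with 0
df["max_price_missing"] = df["max_price"].isna()
df["max_price"] = df["max_price"].fillna(0)

# Only count a ticket as free when its price really was 0
truly_free = df[(df["max_price"] == 0) & ~df["max_price_missing"]]
print(len(truly_free))
```

With this flag, filtering on `max_price == 0` alone would count two tickets here, while only one of them is actually free.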

Experiment 6.1: Getting the Frequency of Price Tickets

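We work only with max_price. Note that missing prices were filled with 0 in the previous step, so the number of free tickets below also includes tickets whose max_price was missing.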

In [68]:
# Count how many tickets have each max_price value
g_maxp=df_tickets.groupby(['max_price']).size().reset_index()
g_maxp=g_maxp.rename(columns={0: "number_of_times"})
# Set aside the free tickets (max_price == 0), then remove them from the price distribution
free_tickets=g_maxp[g_maxp["max_price"]==0]
g_maxp=g_maxp[g_maxp["max_price"]!=0]
g_maxp[:]
Out[68]:
max_price number_of_times
1 5.0 4
2 6.0 4
3 7.0 140
4 7.7 1
5 8.0 51
... ... ...
102 199.0 1
103 800.0 6
104 841.5 1
105 899.0 1
106 1000.0 1

106 rows × 2 columns

In [69]:
fig = px.line(g_maxp, x="max_price", y="number_of_times", title='Frequency of price tickets')
fig.show()
In [70]:
print("The number of free tickets is: %s" %free_tickets["number_of_times"].values[0])
The number of free tickets is: 35822

Experiment 6.2: Getting the frequency of type (Standard, Children) tickets

In [71]:
tickets_type=df_tickets.groupby(['type']).size().reset_index()
tickets_type=tickets_type.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
tickets_type
Out[71]:
type number_of_times
15 Standard 23272
5 Concession 7290
3 Children 3802
9 Members 100
12 Seniors 40
4 Children 10+ 27
7 Family 24
11 Preview 23
16 Students 18
33 under 5s 7
2 Before 12am 6
19 after 6
18 Weekend Ticket 4
31 under 3s 4
30 under 17s/disabled 3
34 under 6s 3
23 festival only 2
22 festival & riding 2
20 ages 12 and under 2
17 VIP Tribe Ticket 2
14 Short Course Youth 2
13 Short Course 2
8 Full Course 2
6 Engaged Couple Goody Bag Ticket 2
1 Ages 6--15 1
21 camping ticket 1
24 pair 1
25 parking 1
26 past members 1
27 trio 1
28 under 10s 1
29 under 13s 1
10 Online Sesion 1
32 under 4s 1
0 7--17s with adult 1
In [72]:
px.histogram(tickets_type, x="type", y="number_of_times", histfunc="sum", color="type", title='Frequency of type tickets')
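For reference, the same frequency table can be obtained more compactly with `value_counts`, which sorts in descending order by default and, like `groupby(...).size()`, drops missing labels. A minimal sketch on hypothetical toy labels:

```python
import pandas as pd

# Hypothetical ticket type labels
types = pd.Series(["Standard", "Concession", "Standard", "Children", "Standard", "Concession"])

# Equivalent to groupby(...).size() followed by a descending sort
counts = types.value_counts().rename_axis("type").reset_index(name="number_of_times")
print(counts)
```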

6.3 Exploring Performances Places

In [73]:
df_tickets["place_id"]
Out[73]:
0            1
0            1
1            1
1            1
1            1
         ...  
2281    128456
2281    128456
2281    128456
2281    128456
2281    128456
Name: place_id, Length: 37370, dtype: int64

Creating the places dataframe

In [74]:
path = "dataset/sample_20180501.json"
# Load the raw dataset and build a dataframe from its "places" list
with open(path, 'r') as f:
    data = json.load(f)
places = data["places"]
print(len(places))
df_places = pd.DataFrame(places)
1224
In [75]:
# Inner join on place_id: ticket rows whose place_id is not in df_places are dropped
df_place = df_tickets.merge(df_places, on='place_id')
In [76]:
df_place.shape[0]
Out[76]:
32292
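The merged frame has 32292 rows, compared with 37370 rows in `df_tickets` (Out[73]), because the default inner join drops ticket rows whose `place_id` has no match in `df_places`. A sketch for inspecting those rows, shown on hypothetical toy frames, uses a left join with `indicator=True`:

```python
import pandas as pd

# Hypothetical toy frames: place_id 3 has no entry in places
tickets = pd.DataFrame({"place_id": [1, 1, 2, 3], "type": ["Standard", "Members", "Standard", "Standard"]})
places = pd.DataFrame({"place_id": [1, 2], "town": ["Edinburgh", "St Andrews"]})

# A left join keeps every ticket row; the "_merge" column records whether a place matched
checked = tickets.merge(places, on="place_id", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched), unmatched["place_id"].tolist())
```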

6.3.1 Frequency of Performances per Town

In [77]:
df_town=df_place.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
# Remove the row for empty town names
town=town[town["town"]!=""]
In [78]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town
Out[78]:
town number_of_times
25 Edinburgh 24020
14 Coldstream 928
65 St Andrews 725
17 Crossford 674
49 Melrose 640
... ... ...
19 Dalkeith 1
62 Selkirk 1
26 Falkland 1
59 Prestonpans 1
60 Pumpherston 1

71 rows × 2 columns

In [79]:
px.scatter(town, x="town",y='number_of_times', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of Performances per Town")

6.3.2 Frequency of Type tickets per town

In [80]:
town_type=df_town.groupby(['town', 'type']).size().reset_index()
town_type=town_type.rename(columns={0: "number_of_times"})
town_type=town_type[town_type["town"]!=""]
In [81]:
town_type=town_type.sort_values(by=['number_of_times'], ascending=False)
town_type
Out[81]:
town type number_of_times
52 Edinburgh Standard 15077
45 Edinburgh Concession 6166
146 St Andrews Standard 563
43 Edinburgh Children 537
20 Coldstream Children 464
... ... ... ...
69 Gifford Standard 1
30 Dalkeith Standard 1
137 Selkirk Standard 1
57 Edinburgh under 10s 1
160 Yetholm Standard 1

159 rows × 3 columns

In [82]:
fig = px.scatter(town_type, x='town', y='type', color='number_of_times', title="Frequency of type tickets per town")
fig.show()
In [83]:
px.scatter(town_type, x="town",y='type', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of performances type tickets per town")

6.3.3 Frequency of Max_Price Tickets per Town

In [84]:
a=df_town[["town", "max_price"]]
a=a[a["town"]!=""]
town_price=a.groupby(['town', 'max_price']).size().reset_index()
town_price=town_price.rename(columns={0: "number_of_times"})
town_price=town_price.sort_values(by=['number_of_times'], ascending=False)
town_price
Out[84]:
town max_price number_of_times
29 Edinburgh 0.00 23318
16 Coldstream 0.00 928
176 St Andrews 0.00 725
21 Crossford 0.00 674
157 Melrose 0.00 640
... ... ... ...
82 Edinburgh 41.25 1
83 Edinburgh 41.80 1
85 Edinburgh 43.45 1
130 Glentress 71.00 1
92 Edinburgh 54.00 1

184 rows × 3 columns

6.3.3.1 Frequency of Free Tickets per Town

In [85]:
free_town_price=town_price[town_price["max_price"]== 0.0]
free_town_price
Out[85]:
town max_price number_of_times
29 Edinburgh 0.0 23318
16 Coldstream 0.0 928
176 St Andrews 0.0 725
21 Crossford 0.0 674
157 Melrose 0.0 640
... ... ... ...
168 Prestonpans 0.0 1
169 Pumpherston 0.0 1
171 Selkirk 0.0 1
13 Burntisland 0.0 1
143 Lauder 0.0 1

68 rows × 3 columns

In [86]:
fig = px.bar(free_town_price, x='town', y='number_of_times', color='number_of_times', barmode='group', title="Frequency of Free Tickets per Town")
fig.show()

6.3.3.2 Frequency of Non-Free Tickets per Town

In [87]:
town_price=town_price[town_price["max_price"]!= 0.0]
town_price
Out[87]:
town max_price number_of_times
32 Edinburgh 7.00 137
151 Livingston 10.50 109
115 Edinburgh 165.00 92
68 Edinburgh 31.95 53
72 Edinburgh 33.95 53
... ... ... ...
82 Edinburgh 41.25 1
83 Edinburgh 41.80 1
85 Edinburgh 43.45 1
130 Glentress 71.00 1
92 Edinburgh 54.00 1

116 rows × 3 columns

In [88]:
fig = px.bar(town_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Town")
fig.show()
In [89]:
town_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[89]:
max_price number_of_times
town
Edinburgh 5735.41 702
Glentress 367.00 6
South Queensferry 195.00 2
Lochgelly 123.00 2
Linlithgow 66.25 2
Berwick-upon-Tweed 51.36 1
Lauder 51.36 1
Musselburgh 50.00 1
Livingston 46.00 134
Crail 42.00 6
Dalkeith 38.50 1
St Monans 25.00 2
Falkland 24.00 1
Anstruther 17.00 5
Galashiels 12.00 1
Bathgate 7.70 1
Kelso 7.00 3
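Summing `max_price` per town (Out[89]) adds up the distinct price values, so the total grows with the number of different prices a town offers rather than with what a ticket typically costs. If a per-town price summary is wanted, a count-weighted mean is one option. A sketch on hypothetical rows shaped like `town_price` (the towns and a `weighted_price` helper column are illustrative only):

```python
import pandas as pd

# Hypothetical rows shaped like town_price: one row per (town, max_price) pair
town_price = pd.DataFrame({
    "town": ["Town A", "Town A", "Town B"],
    "max_price": [10.0, 20.0, 5.0],
    "number_of_times": [3, 1, 2],
})

# Weight each distinct price by how many tickets carry it
town_price["weighted_price"] = town_price["max_price"] * town_price["number_of_times"]
summary = town_price.groupby("town")[["weighted_price", "number_of_times"]].sum()
summary["mean_max_price"] = summary["weighted_price"] / summary["number_of_times"]
print(summary["mean_max_price"])
```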

6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews

6.4.1 Frequency of Price Tickets per Scottish City

In [90]:
scot_towns_price=town_price[town_price['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [91]:
scot_towns_price[0:10]
Out[91]:
town max_price number_of_times
32 Edinburgh 7.00 137
115 Edinburgh 165.00 92
68 Edinburgh 31.95 53
72 Edinburgh 33.95 53
69 Edinburgh 32.00 34
108 Edinburgh 95.00 30
107 Edinburgh 91.00 30
71 Edinburgh 33.00 25
38 Edinburgh 10.50 15
93 Edinburgh 55.00 14
In [92]:
fig = px.bar(scot_towns_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Scottish City")
fig.show()
In [93]:
scot_towns_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[93]:
max_price number_of_times
town
Edinburgh 5735.41 702

6.4.2 Frequency of Type Tickets per Scottish City

In [94]:
scot_towns_type=town_type[town_type['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [95]:
scot_towns_type[0:10]
Out[95]:
town type number_of_times
52 Edinburgh Standard 15077
45 Edinburgh Concession 6166
146 St Andrews Standard 563
43 Edinburgh Children 537
48 Edinburgh Members 99
145 St Andrews Concession 76
44 Edinburgh Children 10+ 27
50 Edinburgh Preview 23
144 St Andrews Children 17
53 Edinburgh Students 14
In [96]:
fig = px.bar(scot_towns_type, x='town', y='number_of_times', color='type', barmode='group', title="Frequency of Type Tickets per Scottish City")
fig.show()
In [97]:
scot_towns_type.groupby(["town"])[["number_of_times"]].sum()
Out[97]:
number_of_times
town
Edinburgh 22004
St Andrews 658
In [98]:
df_place.loc[0]
Out[98]:
event_id                                                     232545
event_name                                              Bright Club
descriptions_x                                                  NaN
event_tags        [Comedy, Days out, Glasgow City of Science, Sc...
place_id                                                          1
start_ts                                  2020-06-23T20:30:00+01:00
end_ts                                    2020-06-23T20:30:00+01:00
0                                                               NaN
currency                                                        GBP
description                                                     NaN
max_price                                                       0.0
min_price                                                       5.0
type                                                       Standard
address                                                5 York Place
email                                          admin@thestand.co.uk
postal_code                                                 EH1 3EB
properties        {'place.child-restrictions': True, 'place.faci...
sort_name                                                     Stand
town                                                      Edinburgh
website                                   http://www.thestand.co.uk
modified_ts                                    2021-11-24T12:18:33Z
created_ts                                     2021-11-24T12:18:33Z
name                                                      The Stand
loc               {'latitude': '55.955806109395006', 'longitude'...
country_code                                                     GB
tags                  [Bar & pub food, Comedy, Restaurants, Venues]
descriptions_y    [{'type': 'description.list.default', 'descrip...
phone_numbers     {'info': '0131 558 7272', 'box_office': '0131 ...
status                                                         live
Name: 0, dtype: object

6.4.3 Frequency of Schedules Dates per Event and per Scottish City

In [99]:
df_place2=df_place.dropna(subset=['town'])
df_scott=df_place2[df_place2['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
df_scott=df_scott[["event_id", "event_name", "event_tags", "town", "start_ts", "end_ts"]]
df_scott[0:3]
Out[99]:
event_id event_name event_tags town start_ts end_ts
0 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
1 232545 Bright Club [Comedy, Days out, Glasgow City of Science, Sc... Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
2 347164 The Saturday Show [Comedy, Stand-up, The Saturday Show] Edinburgh 2020-05-02T20:30:00+01:00 2020-07-25T20:30:00+01:00

Note: an event can have several schedules, and each schedule has a start date and an end date. Therefore, an event can have several start and end dates.
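The one-to-many relation between events and schedules can be flattened into one row per schedule with `pd.json_normalize`. The sketch below assumes a hypothetical record layout (an event dict with a `schedules` list); the key names in the real dataset may differ.

```python
import pandas as pd

# Hypothetical record layout: one event holding a list of schedules
events = [
    {
        "event_id": 1,
        "name": "Example event",
        "schedules": [
            {"place_id": 10, "start_ts": "2020-05-02T20:30:00+01:00", "end_ts": "2020-07-25T20:30:00+01:00"},
            {"place_id": 20, "start_ts": "2020-06-06T20:30:00+01:00", "end_ts": "2020-06-06T22:00:00+01:00"},
        ],
    }
]

# One output row per schedule, with the parent event's fields repeated on each row
schedules = pd.json_normalize(events, record_path="schedules", meta=["event_id", "name"])
print(schedules)
```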

In [100]:
fig = px.scatter(df_scott, x='start_ts', y="event_name", title="Frequency of starting date per event in Scottish cities")
fig.show()
In [101]:
fig = px.scatter(df_scott, x='end_ts', y="event_name", title="Frequency of ending date per event in Scottish cities")
fig.show()

6.4.4 Grouping Schedules per Event and Scottish City

In [102]:
scott_schedule=df_scott.groupby(['event_name', 'town']).size().reset_index()
scott_schedule=scott_schedule.rename(columns={0: "number_of_times"})
scott_schedule=scott_schedule.sort_values(by=['number_of_times'], ascending=False)
scott_schedule
Out[102]:
event_name town number_of_times
296 Charity Garden Opening - Hunter's Tryst Edinburgh 370
30 A Model Education Edinburgh 368
290 Charity Garden Opening - 101 Greenbank Crescent Edinburgh 308
400 Eastern Encounters: Four Centuries of Painting... Edinburgh 272
1109 Silent Disco Tours by Silent Adventures Edinburgh 202
... ... ... ...
1092 Seth Walker Edinburgh 1
497 Films For Students: The Bourne Legacy St Andrews 1
498 Films For Students: The Bourne Supremacy (12) St Andrews 1
1088 Screening: West Side Story Edinburgh 1
1501 [POSTPONED] Dawn Chorus Edinburgh 1

1502 rows × 3 columns

In [103]:
t=scott_schedule.groupby(["event_name"])[["number_of_times"]].sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[103]:
number_of_times
event_name
Charity Garden Opening - Hunter's Tryst 370
A Model Education 368
Charity Garden Opening - 101 Greenbank Crescent 308
Eastern Encounters: Four Centuries of Paintings and Manuscripts from the Indian Subcontinent 272
Silent Disco Tours by Silent Adventures 202
... ...
Ladysmith Black Mambazo 1
Language and Landscape: Holyrood Park 1
Lankum 1
Lau Unplugged 1
[POSTPONED] Dawn Chorus 1

1494 rows × 1 columns

In [104]:
fig = px.bar(t, title="Frequency of Schedules per event")
fig.show()
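Rows in `df_scott` can repeat for the same schedule; for example, Bright Club appears twice in Out[99], once per ticket type, with identical start and end dates. Counting rows therefore counts ticket entries rather than distinct schedules. A sketch, assuming that a schedule is identified by event, town, start and end time, drops duplicates on those columns before counting (shown on hypothetical rows):

```python
import pandas as pd

# Hypothetical rows shaped like df_scott: the first schedule has two ticket types
df_scott = pd.DataFrame({
    "event_id": [1, 1, 2],
    "event_name": ["Show A", "Show A", "Show B"],
    "town": ["Edinburgh", "Edinburgh", "Edinburgh"],
    "start_ts": ["2020-06-23T20:30:00+01:00", "2020-06-23T20:30:00+01:00", "2020-05-02T20:30:00+01:00"],
    "end_ts": ["2020-06-23T20:30:00+01:00", "2020-06-23T20:30:00+01:00", "2020-07-25T20:30:00+01:00"],
})

# Assumption: (event_id, town, start_ts, end_ts) identifies a single schedule
schedule_cols = ["event_id", "town", "start_ts", "end_ts"]
unique_schedules = df_scott.drop_duplicates(subset=schedule_cols)

per_event = (unique_schedules.groupby(["event_name", "town"]).size()
             .reset_index(name="number_of_schedules"))
print(per_event)
```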

6.4.5 Exploring Tags per Schedule and Scottish Cities

In [105]:
a=df_scott.reset_index(drop=True)
tags_town=a[["event_tags", "town"]]
tags_town=tags_town.explode("event_tags")
tags_town
Out[105]:
event_tags town
0 Comedy Edinburgh
0 Days out Edinburgh
0 Glasgow City of Science Edinburgh
0 Science Edinburgh
0 Stand-up Edinburgh
... ... ...
24743 Folk Edinburgh
24743 Rock & Pop Edinburgh
24744 Music Edinburgh
24744 Folk Edinburgh
24744 Rock & Pop Edinburgh

62771 rows × 2 columns

In [106]:
scott_tag=tags_town.groupby(['town', 'event_tags']).size().reset_index()
scott_tag=scott_tag.rename(columns={0: "number_of_times"})
scott_tag=scott_tag.sort_values(by=['number_of_times'], ascending=False)
scott_tag
Out[106]:
town event_tags number_of_times
71 Edinburgh Comedy 7382
343 Edinburgh Visual art 5230
325 Edinburgh Theatre 5155
114 Edinburgh Exhibitions 3796
92 Edinburgh Days out 3571
... ... ... ...
280 Edinburgh Scottish Cup 1
278 Edinburgh Scotland 1
140 Edinburgh Football 1
108 Edinburgh Electro/Electronic 1
0 Edinburgh 60s 1

411 rows × 3 columns

In [107]:
fig=px.histogram(scott_tag, x="town", y="number_of_times", histfunc="sum", color="event_tags", title='Frequency of tags in Scottish Cities')
fig.update_layout(legend_traceorder="reversed")
fig.show()
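A town-by-tag count matrix can also be built in one step with `pd.crosstab`, which may be easier to read than a stacked histogram with hundreds of tag colours. A sketch on hypothetical rows shaped like `tags_town`:

```python
import pandas as pd

# Hypothetical rows shaped like tags_town (one row per exploded tag)
tags_town = pd.DataFrame({
    "event_tags": ["Comedy", "Stand-up", "Comedy", "Theatre", "Comedy"],
    "town": ["Edinburgh", "Edinburgh", "St Andrews", "Edinburgh", "Edinburgh"],
})

# Rows are towns, columns are tags, cells count tag occurrences (0 when absent)
matrix = pd.crosstab(tags_town["town"], tags_town["event_tags"])
print(matrix)
```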
In [108]:
t=scott_tag.groupby(["event_tags"])[["number_of_times"]].sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[108]:
number_of_times
event_tags
Comedy 7390
Visual art 5267
Theatre 5168
Exhibitions 3796
Days out 3618
... ...
Hard Dance 1
Guitar 1
Green Day 1
Glam rock 1
Kinky Boots 1

373 rows × 1 columns

6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh

In [109]:
edi_scott_tag=scott_tag[scott_tag['town'].isin(["Edinburgh"])]
edi_scott_tag
Out[109]:
town event_tags number_of_times
71 Edinburgh Comedy 7382
343 Edinburgh Visual art 5230
325 Edinburgh Theatre 5155
114 Edinburgh Exhibitions 3796
92 Edinburgh Days out 3571
... ... ... ...
280 Edinburgh Scottish Cup 1
278 Edinburgh Scotland 1
140 Edinburgh Football 1
108 Edinburgh Electro/Electronic 1
0 Edinburgh 60s 1

370 rows × 3 columns

In [110]:
edi_scott_tag.groupby(["event_tags"])[["number_of_times"]].sum().sort_values(by=['number_of_times'], ascending=False)
Out[110]:
number_of_times
event_tags
Comedy 7382
Visual art 5230
Theatre 5155
Exhibitions 3796
Days out 3571
... ...
Hard Trance 1
Hard Dance 1
Guitar 1
Green Day 1
Kinky Boots 1

370 rows × 1 columns

In [111]:
fig = px.bar(edi_scott_tag, x='town', y='number_of_times', color='event_tags', barmode='group', title="Frequency of schedules tags for Edinburgh")
fig.show()

6.4.6 Histograms of starting/end schedules dates for Edinburgh

In [112]:
scott_start=df_scott.groupby([pd.to_datetime(df_scott['start_ts']), "town"]).size().reset_index()
scott_start=scott_start.rename(columns={0: "number_of_times"})
scott_start=scott_start.sort_values(by=['number_of_times'], ascending=False)
scott_start.reset_index()
Out[112]:
index start_ts town number_of_times
0 3 2020-05-01 10:00:00+01:00 Edinburgh 3238
1 0 2020-05-01 00:00:00+01:00 Edinburgh 689
2 4 2020-05-01 10:30:00+01:00 Edinburgh 391
3 2 2020-05-01 09:30:00+01:00 Edinburgh 379
4 493 2020-08-05 19:30:00+01:00 Edinburgh 328
... ... ... ... ...
1247 64 2020-05-05 23:00:00+01:00 Edinburgh 1
1248 65 2020-05-06 10:00:00+01:00 Edinburgh 1
1249 66 2020-05-06 10:30:00+01:00 Edinburgh 1
1250 69 2020-05-06 14:00:00+01:00 St Andrews 1
1251 1251 2020-10-31 22:00:00+00:00 Edinburgh 1

1252 rows × 4 columns
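The timestamps mix `+01:00` (British Summer Time) and `+00:00` offsets, as the last row of the table above shows. Depending on the pandas version, `pd.to_datetime` on such a column either returns an object column or warns or raises about mixed time zones; passing `utc=True` yields one consistent tz-aware column. A sketch on hypothetical values that converts back to UK local time and counts schedules per calendar month:

```python
import pandas as pd

# Hypothetical end timestamps with mixed UTC offsets
end_ts = pd.Series([
    "2020-10-25T10:00:00+00:00",
    "2020-09-27T00:00:00+01:00",
    "2020-10-31T22:00:00+00:00",
])

# utc=True parses mixed offsets into a single tz-aware UTC column
ts = pd.to_datetime(end_ts, utc=True)
# Convert to UK local time before grouping by calendar month
local = ts.dt.tz_convert("Europe/London")
monthly = local.dt.strftime("%Y-%m").value_counts().sort_index()
print(monthly)
```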

In [113]:
ed_scott_start=scott_start[scott_start['town'].isin(["Edinburgh"])].reset_index()
ed_scott_start.groupby(["start_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_start, x='town', y='number_of_times', color='start_ts', barmode='group', title="Frequency of starting date schedules for Edinburgh")
#fig.show()
Out[113]:
index number_of_times
start_ts
2020-05-01 10:00:00+01:00 3 3238
2020-05-01 00:00:00+01:00 0 689
2020-05-01 10:30:00+01:00 4 391
2020-05-01 09:30:00+01:00 2 379
2020-08-05 19:30:00+01:00 493 328
... ... ...
2020-09-05 19:20:00+01:00 908 1
2020-09-05 13:00:00+01:00 907 1
2020-09-05 11:15:00+01:00 906 1
2020-09-04 19:30:00+01:00 903 1
2020-10-31 22:00:00+00:00 1251 1

1166 rows × 2 columns

In [114]:
scott_end=df_scott.groupby([pd.to_datetime(df_scott['end_ts']), "town"]).size().reset_index()
scott_end=scott_end.rename(columns={0: "number_of_times"})
scott_end=scott_end.sort_values(by=['number_of_times'], ascending=False)
ed_scott_end=scott_end[scott_end['town'].isin(["Edinburgh"])].reset_index()
ed_scott_end.groupby(["end_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_end, x='town', y='number_of_times', color='end_ts', barmode='group', title="Frequency of ending date schedules for Edinburgh")
#fig.show()
Out[114]:
index number_of_times
end_ts
2020-10-31 10:00:00+00:00 1184 3178
2020-10-31 00:00:00+00:00 1181 530
2020-06-28 10:00:00+01:00 302 413
2020-10-25 10:00:00+00:00 1134 326
2020-09-27 00:00:00+01:00 951 308
... ... ...
2020-07-18 14:00:00+01:00 350 1
2020-07-18 13:00:00+01:00 349 1
2020-07-17 19:00:00+01:00 348 1
2020-07-16 23:00:00+01:00 347 1
2020-10-31 22:00:00+00:00 1225 1

1142 rows × 2 columns

In [115]:
fig = px.histogram(ed_scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Edinburgh")
fig.show()
In [116]:
fig = px.histogram(scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Scottish Cities")
fig.show()
In [117]:
fig = px.histogram(scott_end, x='end_ts', y="number_of_times", title="Histogram of Schedules Ending Dates for Scottish Cities")
fig.show()
In [118]:
fig = px.histogram(scott_end, x="end_ts", y="number_of_times", histfunc="sum", title="Monthly Histogram of Schedules Ending Dates for Scottish Cities")
fig.update_traces(xbins_size="M1")
fig.update_xaxes(showgrid=True, ticklabelmode="period", dtick="M1", tickformat="%b\n%Y")
fig.update_layout(bargap=0.1)
fig.add_trace(go.Scatter(mode="markers", x=scott_end["end_ts"], y=scott_end["number_of_times"], name="daily"))
fig.show()

6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time

In [119]:
b=df_scott.reset_index(drop=True)
tag_town_time=b[["event_tags", "town", "start_ts", "end_ts"]]
tag_town_time=tag_town_time.explode("event_tags")
tag_town_time
Out[119]:
event_tags town start_ts end_ts
0 Comedy Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
0 Days out Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
0 Glasgow City of Science Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
0 Science Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
0 Stand-up Edinburgh 2020-06-23T20:30:00+01:00 2020-06-23T20:30:00+01:00
... ... ... ... ...
24743 Folk Edinburgh 2020-08-24T21:00:00+01:00 2020-08-24T21:00:00+01:00
24743 Rock & Pop Edinburgh 2020-08-24T21:00:00+01:00 2020-08-24T21:00:00+01:00
24744 Music Edinburgh 2020-08-24T21:00:00+01:00 2020-08-24T21:00:00+01:00
24744 Folk Edinburgh 2020-08-24T21:00:00+01:00 2020-08-24T21:00:00+01:00
24744 Rock & Pop Edinburgh 2020-08-24T21:00:00+01:00 2020-08-24T21:00:00+01:00

62771 rows × 4 columns

In [120]:
scott_tag_end=tag_town_time.groupby([pd.to_datetime(tag_town_time['end_ts']), "event_tags"]).size().reset_index()
scott_tag_end=scott_tag_end.rename(columns={0: "number_of_times"})
scott_tag_end=scott_tag_end.sort_values(by=['number_of_times'], ascending=False)


scott_tag_start=tag_town_time.groupby([pd.to_datetime(tag_town_time['start_ts']), "event_tags"]).size().reset_index()
scott_tag_start=scott_tag_start.rename(columns={0: "number_of_times"})
scott_tag_start=scott_tag_start.sort_values(by=['number_of_times'], ascending=False)
In [121]:
scott_tag_start
Out[121]:
start_ts event_tags number_of_times
44 2020-05-01 10:00:00+01:00 Visual art 2836
33 2020-05-01 10:00:00+01:00 Exhibitions 2557
37 2020-05-01 10:00:00+01:00 Painting & Drawing 1317
3 2020-05-01 00:00:00+01:00 Days out 689
6 2020-05-01 00:00:00+01:00 Gardens 678
... ... ... ...
2328 2020-08-18 20:00:00+01:00 Stand-up 1
2327 2020-08-18 20:00:00+01:00 Comedy 1
387 2020-05-10 05:00:00+01:00 Walks 1
390 2020-05-10 13:00:00+01:00 Days out 1
3876 2020-10-31 22:00:00+00:00 Pop 1

3877 rows × 3 columns

6.4.7.1 Frequency of schedules Starting Date in Scottish City

In [122]:
#fig = px.bar(scott_tag_start, x='event_tags', y='start_ts', color='number_of_times', barmode='group', title="Frequency of schedules tags per Scottish City")
#fig.show()

fig = px.scatter(scott_tag_start, x='start_ts', y='number_of_times', title="Frequency of schedules Starting Date in Scottish City.")
fig.show()

6.4.7.2 Frequency of schedules Ending Date in Scottish City

In [123]:
fig = px.scatter(scott_tag_end, x='end_ts', y='number_of_times', title="Frequency of schedules Ending Date in Scottish City.")
fig.show()

6.4.7.3 Scheduled tags and Starting Dates in Scottish City

In [124]:
fig = px.scatter(scott_tag_start, x='start_ts', y='event_tags', title="Scheduled Tags and Starting Dates in Scottish City.")
fig.show()

6.4.7.4 Scheduled Tags and Ending Dates in Scottish City

In [125]:
fig = px.scatter(scott_tag_end, x='end_ts', y='event_tags', title="Scheduled Tags and Ending Dates in Scottish City.")
fig.show()